Combination Methods for Crosslingual Web Retrieval

Authors

  • Jaap Kamps
  • Maarten de Rijke
  • Börkur Sigurbjörnsson
Abstract

We investigate a range of crosslingual web retrieval tasks using the test suite of the CLEF 2005 WebCLEF track, which features a stream of known-item topics in various languages. Our main findings are: (i) straightforward indexing and retrieval are effective for mixed monolingual web retrieval; (ii) standard machine translation methods are effective for bilingual web retrieval; but (iii) standard combination methods are ineffective for multilingual web retrieval. We analyze the failure and suggest an alternative Z-score normalization that leads to effective multilingual retrieval results.
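
As a rough illustration of the Z-score normalization mentioned in the abstract, the sketch below rescales the scores of each per-language run to zero mean and unit variance before merging the runs into one ranked list. The abstract gives neither the exact formula nor the merging strategy, so the function names (`z_normalize`, `merge_runs`), the use of the population standard deviation, the merge-by-sorting step, and the sample document ids are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def z_normalize(run):
    """Rescale one run's scores to z-scores: (score - mean) / std.

    `run` is a list of (doc_id, score) pairs from a single retrieval run.
    The population standard deviation is an assumption of this sketch.
    """
    if not run:
        return []
    scores = [score for _, score in run]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so none stands out from the rest.
        return [(doc_id, 0.0) for doc_id, _ in run]
    return [(doc_id, (score - mu) / sigma) for doc_id, score in run]


def merge_runs(runs):
    """Normalize each run separately, then rank all results together."""
    merged = []
    for run in runs:
        merged.extend(z_normalize(run))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical runs whose raw scores live on very different scales.
    run_en = [("en1", 10.0), ("en2", 20.0), ("en3", 30.0)]
    run_nl = [("nl1", 0.1), ("nl2", 0.2), ("nl3", 0.3)]
    for doc_id, z in merge_runs([run_en, run_nl]):
        print(f"{doc_id}\t{z:+.3f}")
```

Merging on raw scores would place every document of the first run above every document of the second simply because its scores are numerically larger. After normalization, the best document of each run competes on equal terms.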

Similar resources

Automatic crosslingual thesaurus generated from the Hong Kong SAR Police Department Web corpus for crime analysis

based approach to align English/Chinese Hong Kong Police press release documents from the Web is first presented. We also introduce an algorithmic approach to generate a robust knowledge base based on statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consisted of a thesaurus-like, semantic network knowledge base, wh...

Crosslingual Ontology-Based Document Retrieval

An approach for crosslingual ontology-based document retrieval has been devised and is being implemented. It allows the user to enter a query in any language that is part of the system and retrieve documents in selected languages. A domain ontology and term-concept lexicons, containing synonymous terms where applicable, are used to overcome discrepancies between the search query and the words o...

Index Combinations and Query Reformulations for Mixed Monolingual Web Retrieval

We examine the effectiveness on the multilingual WebCLEF 2006 test set of light-weight methods that have proved successful in other web retrieval settings: combinations of document representations on the one hand and query reformulation techniques on the other. We investigate a range of approaches to crosslingual web retrieval using the test suite of the mixed monolingual CLEF 2006 WebCLEF trac...

A Patient Support System based on Crosslingual IR and Semi-supervised Learning

Even though patients are now using the Web to get useful information, the latest medical information is not available in most languages, except English. Even if patients want to learn about current treatments, they do not want to read English documents filled with technical terms. To mitigate this situation, we are building a patient support system that combines crosslingual information retriev...

Discovering Parallel Text from the World Wide Web

A parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and multilingual text mining. Constructing a parallel corpus requires effective alignment of parallel documents. In this paper, we develop a parallel page identification system for identifying and aligning parallel documents ...

Publication date: 2005